ABSTRACT

This report explains how students with significant 
disabilities who submit portfolios for the Massachusetts Comprehensive 
Assessment System (MCAS) Alternate Assessments are evaluated to meet state 
requirements for a "competency determination" needed to receive a regular 
high school diploma. The reasoning behind the Massachusetts approach and the 
ways in which performance levels in each strand are combined to produce an 
overall performance level are described in this report. Student portfolios are 
based on "expanded" state standards that describe academic outcomes 
appropriate for students with significant disabilities. Portfolios contain an 
array of work samples, instructional data sheets, audio- and videotapes, or 
other evidence, organized into "portfolio strands." Portfolios are submitted 
to the state for scoring and designation of a performance level. Following an 
overview, the report examines the development of alternate assessment in 
Massachusetts. It then explains standards for counting scores to yield an 
overall performance level including the analytic rubric used to convert raw 
scores to performance levels. The final section considers standards for 
meeting the state's graduation requirements through MCAS Alternate 
Assessment. (DB) 
Executive Summary



In Massachusetts, about one percent of all students being assessed submit portfolios for the 
Massachusetts Comprehensive Assessment System (MCAS) Alternate Assessment. These port- 
folios are based on “expanded” state standards that describe academic outcomes appropriate for 
students with significant disabilities. Teachers collect “evidence” of their students’ performance 
on the standards during targeted instructional activities or structured student observations to 
create portfolios that contain an array of work samples, instructional data sheets, audio- and 
videotapes, or other evidence organized into “portfolio strands” in each content area. 

MCAS Alternate Assessments are submitted to the state for scoring and designation of a perfor- 
mance level that gives parents and teachers information on how well these students are learning 
the general curriculum relative to their past performance and the performance of other students. 
The process used by the Massachusetts Department of Education to assign performance levels 
to alternate assessments is the focus of this report. This technical phase, called standard setting, 
reflects several steps that typically occur between scoring and reporting. However, the process 
reflects theoretical debates and decisions that occurred much earlier in the development process 
of the alternate assessment, sometimes years before the first portfolio was compiled and sub- 
mitted. Several of these earlier conversations and their consequences are also described in this 
report since the recommendations form the philosophical basis of much that followed. 

The alternate assessment in Massachusetts is one pathway to meet the state requirements for earn- 
ing a “competency determination” needed to receive a regular high school diploma. Therefore, 
it was necessary to calibrate performance levels precisely between the alternate assessment and 
the general assessment, especially at the Needs Improvement level, which is the level required 
to earn the competency determination. Massachusetts decided to use an analytical rubric to 
convert raw scores to performance levels. Combinations of scores that could be obtained across 
the alternate assessment scoring rubrics for Level of Complexity, Demonstration of Knowledge 
and Skills, and Independence were discussed and reasoned perceptions were used to assign 
performance levels of Awareness, Emerging, Progressing, and Needs Improvement or above 
(Proficient, Advanced) in each portfolio strand. The reasoning behind the Massachusetts approach 
and the ways in which performance levels in each strand are combined to produce an 
overall performance level are described further in the report. This approach reflects not only the 
Massachusetts standards, but also its unique culture and values. 
Overview



States are finding different ways to adapt their accountability systems to include all students, 
because the achievement of students with disabilities has typically lagged behind that of their 
non-disabled peers, and because recent state and federal laws require the participation of all 
students. Special educators are considering how, rather than whether, students with disabilities 
will participate in statewide assessments, while assessment policies themselves have become 
more flexible in accommodating the administration of those tests. Curriculum experts are placing 
increased emphasis on teaching students with disabilities the same content and skills being taught 
to their non-disabled peers, while regular and special educators are working together to adapt 
curriculum and instruction so diverse learners may participate more fully in academic activities. 

A comparatively small number of students with the most complex and significant disabilities, 
though, have been more difficult to include in statewide assessments. Academic skills and sub- 
ject matter have not always been a part of the curriculum for this population, and information 
has not systematically been collected on what these students have learned. The performance 
of these students is not easily determined using the same standardized paper-and-pencil tests 
used with the majority of students, but since participation in these assessments is now re- 
quired, states have had to decide how best to include these students by giving them “alternate 
assessments.” Alternate assessment methods and formats are determined by each state indi- 
vidually, though their common purpose is to improve instruction for these students and report 
their academic performance. By using alternate assessments with this population, schools 
can document what is being taught for purposes of system accountability, and demonstrate 
to parents and the public to what degree each of these students has learned state standards. 

A majority of states have adopted individual academic portfolios as the most effective 
method of assessment for these “difficult-to-assess” students. Student portfolios accom- 
modate a range of approaches to document learning, and afford teachers options for de- 
termining the ideal time, place, and method to assess their students. Portfolios provide 
teachers, students, and their parents with tangible evidence of student performance and 
feedback on their progress. While the contents of each portfolio are unique, their structure allows for 
evaluation and scoring using uniform criteria that can be shared with teachers beforehand. 

The demands of creating and managing portfolios, and compiling this information for submission 
to the state, however, require additional expertise on the part of teachers and time in which 
to complete this work. This fundamental change in classroom practice has required states to 
make a strong and continued commitment to provide professional development and technical 
assistance to educators who conduct alternate assessments, and to engage in an open dialogue 
about the efficiency, rigor, and usefulness of the process with those who are most affected by it. 
Alternate Assessment in Massachusetts



In Massachusetts, about 5,000 students, or one percent of all students being assessed, submit 
portfolios for the Massachusetts Comprehensive Assessment System (MCAS) Alternate Assess- 
ment. In creating portfolios, their teachers must first identify challenging outcomes for each 
student based on the standards in each content area being assessed. Many states, including Mas- 
sachusetts, use an “expanded” version of their standards that describes academic outcomes that 
are appropriate for students with significant disabilities. Teachers then collect “evidence” of their 
students’ performance on those standards during targeted instructional activities or structured 
student observations. Portfolios may contain an array of work samples, instructional data sheets, 
audio- and videotapes, and other evidence organized into “portfolio strands” in each content area. 

Once MCAS Alternate Assessments are submitted to the state, these are scored and a perfor- 
mance level assigned in each content area so parents and teachers have information on how 
well these students are learning the general curriculum relative to their past performance and 
the performance of other students. The process used by the Massachusetts Department of Edu- 
cation to assign performance levels to alternate assessments is the focus of this report. This 
technical phase, called standard setting, reflects several steps that typically occur between 
scoring and reporting (Quenemoen, Rigney, & Thurlow, 2002). However, the process reflects 
theoretical debates and decisions that occurred much earlier in the development process of 
the alternate assessment, sometimes years before the first portfolio was compiled and submit- 
ted. Several of these earlier conversations and their consequences are also described in this 
report since the recommendations form the philosophical basis of much that followed. First 
among these conceptual discussions was defining who should take an alternate assessment. 



A Diverse Group of Advisors 

Late in 1998, the Massachusetts Department of Education began convening regular meetings of a 
task force composed of DOE staff (from Special Education and Assessment units), the contractor 
team (Measured Progress and the ILSSA group at the University of Kentucky), and 
the Massachusetts Alternate Assessment Advisory Committee (a diverse stakeholder group). 
The task force provided recommendations to the Department on a range of assessment issues, including: 

• how to provide guidance to IEP teams about which students to consider for alternate 
assessments; 

• what alternate assessments should look like; 

• how alternate assessments should be scored; 
• which scores should “count” toward overall performance; and 

• how to describe and report the performance of students who take alternate assess- 
ments. 



Guidelines for IEP Teams: Who Should Take Alternate Assessments? 

It was assumed from the beginning that students who needed alternate assessments were, for the 
most part, those who could not take paper-and-pencil tests and whose academic performance was 
based on the expanded standards appropriate for students with significant disabilities. However, 
the task force also identified students whose disabilities were not primarily cognitive who, they 
felt, should also be considered for alternate assessments by their IEP and 504 teams. Generally, 
this smaller group of identified students had disabilities that presented them with “unique and 
significant challenges” to participation in standardized statewide testing regardless of the accom- 
modations they could use on those tests. They recommended, for example, that students with 
severe behavioral and emotional disabilities, or those with cerebral palsy, sensory impairments 
(deaf, blind, or deaf and blind), or fragile health and medical conditions should also be considered 
for alternate assessments, regardless of their levels of academic performance since taking on-de- 
mand statewide tests could present them with insurmountable barriers to their participation, and 
therefore deny them access to the assessment (Massachusetts Department of Education, 1999). 

Based on guidelines provided to Massachusetts IEP Teams since 1999, students across the 
full spectrum of academic performance, then, are eligible to take alternate assessments, even 
when they are able to demonstrate the same (or higher) levels of performance as a tested stu- 
dent. They simply require an alternate assessment format to demonstrate their knowledge and 
skills. Therefore, the MCAS reporting system required sufficient flexibility and integrity to 
provide meaningful feedback on students who demonstrate a “comparable performance” to a 
student who scores at the highest levels on the standard tests. It also became necessary to in- 
corporate a method by which a student could meet the state’s graduation requirement through 
an alternate assessment. The task force strongly advised that the alternate assessment be a dif- 
ferent, though not easier, pathway to demonstrate the same performance as a tested student. 



Scoring Alternate Assessments 

The task force next considered and selected criteria on which to base the scores of alternate 
assessment portfolios. They advised the Department to develop criteria based primarily on 
student performance, since that is what the standard assessment measured, rather than assess- 
ing how well the student’s program provided opportunities to learn this material. Some on 
the task force, however, felt that student achievement could not be separated from program 
effectiveness. In the end, a scoring rubric was developed in which four out of six categories 
are based on student performance, and two reflect the effectiveness of the student’s program: 

• Completeness of the portfolio 

• Level of complexity: the difficulty of academic tasks and knowledge attempted by the 
student 

• Demonstration of Skills and Concepts: the accuracy of the student’s performance 

• Independence: cues, prompts, and other assistance required by the student to perform 
the tasks or activities 

• Self-evaluation: the extent to which opportunities are provided to reflect, set goals, 
evaluate, and monitor the student’s own performance 

• Generalized Performance: the number of contexts and instructional approaches pro- 
vided to the student to perform tasks and demonstrate knowledge 

Scores are determined and reported in each of the rubric areas listed above. Once numeri- 
cal scores are obtained for a portfolio in these rubric areas, raw scores must somehow be 
combined to identify an overall performance level in the content area. Before perfor- 
mance levels can be determined, however, several important questions must be answered: 

• What will each performance level be called; how many performance levels will there 
be; and how will each be defined? 

• Which numerical scores in which rubric areas will be counted in determining the 
overall performance level? 

• How will numerical scores in those rubric areas be combined to yield a performance 
level? 

• What range or combination of scores will yield a particular performance level? 



Defining Performance Levels 

The task force recommended that performance levels be identical to performance levels on 
standard MCAS tests; but that the lowest performance level, called “Warning/Failing at Grade 
10” for tested students, would be sub-divided into three distinct levels in order to provide 
more meaningful descriptions of performance at these lower levels. Figure 1 illustrates the 
performance levels and definitions used by Massachusetts to report assessment results on the 
standard and the alternate assessments, and the relationship between the two reporting scales. 



Figure 1. MCAS Performance Levels 
Counting Scores Toward an Overall Performance Level

On several occasions, the task force revisited the question of which scores to count in calculating 
the overall level of performance. In reviewing the goals, methods, and purpose of the general 
assessment, they realized, in essence, that regular MCAS tests measure the ability of a student to 
respond to test items accurately, with no assistance from peers or from the adult(s) administer- 
ing the test, and that test results are based solely on the correctness of the student’s responses. 

In the end, their recommendation was to “parallel the goals, methods, and purpose of the general 
assessment, where possible,” when no other solution is obvious. With this advice, the task force 
established a foundation for future decision-making, and returned to this guidance frequently. 

With these assumptions about the general assessment, and the advice of the task force to parallel 
the general assessment where possible, the Department decided it would base alternate assess- 
ment performance levels on raw numerical portfolio scores given in the areas of completeness, 
complexity, accuracy, and independence only; but not on self-evaluation or generalized per- 
formance, since scores in these last two areas depended on opportunities provided to the student, 
not on the student’s direct performance of the skill being assessed. Scores in all rubric areas, 
however, would be reported to schools and parents in order to provide those who work most 
closely with the student detailed information on his or her performance as shown in Figure 2. 

Separate scores are reported for each strand in Level of Complexity, Demonstration of 
Skills and Concepts (accuracy), and Independence, while scores in the secondary areas of 
Self-Evaluation and Generalized Performance are combined for the entire content area. 



How Will Numerical Scores be Combined to Yield a Performance Level? 

The Massachusetts Department of Education consulted with Ed Roeber of Measured 
Progress to assist in developing a strategy or formula for combining scores to obtain an 
overall performance level for each content area. Over time, Dr. Roeber recommended 
several options for calculating a numerical score total in each content area of a portfolio. 
The following were two mathematical formulas considered by the Department: 

Method #1 - Calculate the sum of scores in three rubric areas: 

LC + DSC + Ind = Total Score 

Method #2 - Multiply LC by the sum of the other two rubric areas: 

LC x (DSC + Ind) = Total Score 



Key 

LC = Level of Complexity 

DSC = Demonstration of Skills and Concepts 

Ind = Independence 

Consider the following scenario using both scoring methods: 



Student A 
Raw Scores: LC=3, DSC=3, Ind=3 
Student A Total Score (Method #1) = 3 + 3 + 3 = 9 
Student A Total Score (Method #2) = 3 x (3 + 3) = 18 

Student B 
Raw Scores: LC=2, DSC=4, Ind=4 
Student B Total Score (Method #1) = 2 + 4 + 4 = 10 
Student B Total Score (Method #2) = 2 x (4 + 4) = 16 
Figure 2. Excerpt of Sample Parent/Guardian Report 
Using Method #1, Student A scored lower (9) than Student B (10), although Student A 
worked on more challenging subject matter (LC=3) than Student B (LC=2). Using Method 
#2, on the other hand, Student A scored higher (18) than Student B (16), thereby rewarding 
Student A for attempting more challenging material. For certain score combinations, 
Method #1 appeared to create a disincentive for students to attempt increasingly complex 
skills and content, and discouraged teachers from providing more challenging instruction 
to their students, which was certainly not the intent of the alternate assessment. 

Because the LC score is used as a multiplier in Method #2, scores are also spread over a wider 
range (1-40), avoiding the possibility of overlapping totals. Method #1, on the other hand, spreads 
scores across a narrow range (1-13) since scores are simply added together. It was agreed that 
Method #2 would be explored further for its effectiveness, impact, and unintended consequences, if any. 
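
The comparison of the two formulas can be reproduced with a short Python sketch. It is only an illustration of the arithmetic described above: the class and function names are hypothetical and are not part of any MCAS-Alt scoring software, and the raw scores are the Student A and Student B values from the scenario.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StrandScores:
    """Raw rubric scores for one portfolio strand (names are illustrative)."""
    lc: int   # Level of Complexity
    dsc: int  # Demonstration of Skills and Concepts
    ind: int  # Independence


def total_method_1(scores: StrandScores) -> int:
    """Method #1: LC + DSC + Ind."""
    return scores.lc + scores.dsc + scores.ind


def total_method_2(scores: StrandScores) -> int:
    """Method #2: LC x (DSC + Ind)."""
    return scores.lc * (scores.dsc + scores.ind)


# Raw scores taken from the scenario in the text.
student_a = StrandScores(lc=3, dsc=3, ind=3)
student_b = StrandScores(lc=2, dsc=4, ind=4)

for name, scores in (("A", student_a), ("B", student_b)):
    print(name, total_method_1(scores), total_method_2(scores))
# prints:
# A 9 18
# B 10 16
```

Under Method #1 the ordering favors Student B, while under Method #2 it favors Student A, which matches the reversal discussed above.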

Combining Scores to Yield a Performance Level 

Dr. Roeber suggested that MCAS-Alt project leadership meet with regular assessment psy- 
chometricians and data analysts from the Department and from Measured Progress to review 
and select the most effective formula for calculating a total content area score, and to identify 
“cut scores” for specific performance levels based on a range of calculated score totals. During 
ensuing discussions, however, questions were raised about the necessity of generating a single 
total numerical score for each strand and content area in the alternate assessment, and whether 
it might cause confusion to introduce another, entirely different score scale beside the 200-280 
score scale already in use for MCAS test results. Some felt this would reinforce the separateness 
of the alternate assessment and wondered instead whether a system could be developed that used 
reasoned judgment, instead of a calculation, to describe overall student performance based on 
different raw score combinations. After a spirited discussion, this reasoning prevailed, and the idea 
of calculating a total numerical portfolio score was abandoned in favor of a different approach. 

Whether a mathematical equation or a reasoned approach is used to determine a student’s perfor- 
mance, however, some kind of scale, analytical rubric, or other consistent method must be used to 
convert raw scores to performance levels (Roeber, 2002). The analytical rubric developed for this 
purpose in Massachusetts is actually a series of grids based on a student’s score as shown in Figure 3. 

Sixty-four different possible score combinations were discussed and analyzed by the group, and a 
performance level identified by consensus for each. Decisions were based on reasoned perceptions 
of what each score combination revealed about the student’s performance, and the relative position 
of that performance level within the hierarchy of other levels. It was easier to analyze and assign 
performance levels beginning with the lowest and highest levels, then working toward the middle. 
In the end, the group was able to define and categorize all score combinations. The model was tested 
using various arbitrary score combinations to check that the defined performance level made sense, 
given the student’s scores, and that scores were appropriately scaled relative to adjacent scores. 

An analysis of several arbitrary score combinations reveals, for example, that a student who 
scores LC=3, DSC=2, and Ind=3 according to the MCAS-Alt scoring rubric, is a student 
who is working on modified (or “expanded”) learning standards, who demonstrates 26-50% 
accuracy, and who needs assistance 51-75% of the time during standards-based activities 
(Massachusetts Department of Education, 2001). From this information, the student would 
appear to be performing above the definition of Awareness in this content area, but not yet at 
Progressing, in which the student would perform the skills and demonstrate the knowledge 
with greater independence and accuracy. Since this student is somewhere between the Aware- 
ness and Progressing performance levels, we can say with relative confidence that the student 
is at the Emerging level. Another student who hypothetically scored LC=3, DSC=3, Ind=4 is 
also working on modified standards, but performs with a sufficiently high rate of accuracy and 
independence to be placed in the Progressing performance level. He or she is probably ready 
to attempt even more challenging tasks, skills, and concepts in the coming year, since the 
data suggest he or she has mastered skills and content in the current portfolio. Figure 3 shows 
the complete analytical rubric for determining performance levels in each portfolio strand. 
Figure 3. Analytical Rubric for Determining Performance Levels in Each Portfolio Strand 

Aw = Awareness 
Em = Emerging 
Pg = Progressing 
NI+ = Either Needs Improvement, Proficient, or Advanced, as determined by an expert panel 
reviewing these portfolios 
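
Because the analytical rubric assigns a level to each score combination by judgment rather than by formula, it behaves like a lookup table. The Python sketch below illustrates that idea under a clear limitation: it contains only the two combinations worked through in the text, and the remaining cells of Figure 3 are left out rather than guessed. The names `KNOWN_STRAND_LEVELS` and `strand_level` are hypothetical.

```python
from typing import Optional

# Keys are (LC, DSC, Ind) raw scores; values are strand performance levels.
# Only the two combinations discussed in the text are included. The other
# cells of the Figure 3 grids are intentionally omitted, not inferred.
KNOWN_STRAND_LEVELS = {
    (3, 2, 3): "Emerging",
    (3, 3, 4): "Progressing",
}


def strand_level(lc: int, dsc: int, ind: int) -> Optional[str]:
    """Return the level for a score combination, or None if it is not in this partial table."""
    return KNOWN_STRAND_LEVELS.get((lc, dsc, ind))


print(strand_level(3, 2, 3))  # Emerging
print(strand_level(3, 3, 4))  # Progressing
print(strand_level(2, 4, 4))  # None (this cell is not given in the text)
```

A table of this kind makes each judgment explicit and reviewable, which fits the consensus process described above better than a single computed total.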



Calculating the Overall Performance Level 

Once performance levels are determined for each of three required portfolio strands in the content 
area, based on the analytical rubric shown in Figure 3, these are averaged and rounded to the 
nearest whole number to determine the overall performance level in that subject. To calculate 
the average of three performance levels, consecutive numerical values are given to each performance 
level, as follows: Awareness = 1, Emerging = 2, Progressing = 3, Needs Improvement = 4, 
etc. Figure 4 shows how different combinations are averaged to yield a final performance level. 

Figure 4. Performance Levels in Each Strand are Averaged to Determine an Overall 
Performance Level 
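
The averaging step can be sketched in Python as follows. This is a minimal illustration, assuming exactly three strands and using only the four numerical values stated in the text; values for levels above Needs Improvement are not listed in the source and are therefore omitted. The function name `overall_level` is hypothetical.

```python
# Numerical values stated in the text; higher levels ("etc.") are not filled in here.
LEVEL_VALUES = {
    "Awareness": 1,
    "Emerging": 2,
    "Progressing": 3,
    "Needs Improvement": 4,
}
VALUE_LEVELS = {value: level for level, value in LEVEL_VALUES.items()}


def overall_level(strand_levels):
    """Average three strand levels and round to the nearest whole number."""
    if len(strand_levels) != 3:
        raise ValueError("expected performance levels for exactly three strands")
    mean = sum(LEVEL_VALUES[level] for level in strand_levels) / 3
    # The mean of three whole numbers ends in .00, .33, or .67, so it never
    # falls exactly halfway between two levels and rounding is unambiguous.
    return VALUE_LEVELS[round(mean)]


print(overall_level(["Emerging", "Progressing", "Progressing"]))  # Progressing
print(overall_level(["Awareness", "Awareness", "Emerging"]))      # Awareness
```

In the first example the average is 8/3 (about 2.67), which rounds up to 3; in the second it is 4/3 (about 1.33), which rounds down to 1.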
Meeting the State’s Graduation Requirement Through MCAS Alternate Assessment

A performance level of Needs Improvement or higher is required on grade 10 MCAS assess- 
ments in English Language Arts and Mathematics in order to earn a “competency determina- 
tion” (the state’s requirement to receive a regular high school diploma). As previously stated, 
alternate assessment is one pathway to meet that requirement. Therefore, it is necessary to 
calibrate performance levels precisely between the alternate assessment and the general assess- 
ment, especially at the Needs Improvement level. What does a Needs Improvement portfolio 
look like, and what specifically constitutes a “comparable performance” to a student who was 
tested and earned this score? Although portfolio scorers can accurately determine a portfolio’s 
completeness, accuracy, and independence of performance, an additional level of review 
seemed necessary in order to assure the breadth, quality, and comparability of the student’s 
performance to that of other students who passed the grade 10 MCAS tests in those subjects. 
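
The threshold itself can be expressed as a simple check, sketched below in Python. It covers only the stated rule (Needs Improvement or higher in both English Language Arts and Mathematics) and does not model the additional panel review described in this section. The level ordering follows the sequence given in this report, and the function name is hypothetical.

```python
# Performance levels in ascending order, as listed in this report.
LEVEL_ORDER = [
    "Awareness",
    "Emerging",
    "Progressing",
    "Needs Improvement",
    "Proficient",
    "Advanced",
]


def meets_competency_determination(ela_level: str, math_level: str) -> bool:
    """True if both grade 10 levels are at or above Needs Improvement."""
    threshold = LEVEL_ORDER.index("Needs Improvement")
    return all(LEVEL_ORDER.index(level) >= threshold for level in (ela_level, math_level))


print(meets_competency_determination("Needs Improvement", "Proficient"))  # True
print(meets_competency_determination("Progressing", "Advanced"))          # False
```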

To accomplish this, the Department convenes a panel of math and English language arts content 
specialists each year to review a selection of grade 10 portfolios set aside for this purpose, and to 
make recommendations to the Department on whether these students have demonstrated achieve- 
ment at or above Needs Improvement level based on the evidence in their portfolios. Panelists, 
themselves, were selected by the Department for their secondary-level teaching expertise in 
the content area; their experience serving on the state’s Assessment Development Committees 
that develop and review general assessment test items with the state’s test contractor; and their 
extensive familiarity with Massachusetts Curriculum Frameworks. Panelists are familiar with 
work typical of students who “passed” the grade 10 MCAS tests in ELA and Mathematics 
since they teach these students on a daily basis. Panel members were asked to examine 
pre-scored portfolios at Level of Complexity 4 and 5, and to verify whether they felt the contents: 

• document the full range of learning standards, covering knowledge and skills tested 
on grade 10 MCAS tests in the content area; 

• demonstrate a level of performance typical of students who perform at the Needs Im- 
provement level on the MCAS test in that subject; and 

• exemplify an even higher performance level than Needs Improvement; for example, 
Proficient or Advanced. 



Although the number of students each year who perform at or above the Needs Improvement level 
on grade 10 ELA and Math alternate assessments is relatively small, this number can be expected to 
grow over time. Of course, as teachers also gain familiarity with portfolio management techniques, 
submission requirements, curriculum alignment, and instructional improvements, the scores of 
all students will rise. It is important for states to demonstrate the effectiveness of their statewide 
alternate assessments to improve the nature of instruction for students with significant disabilities 
generally, and to show that these improvements translate into expanded opportunities for these 
students both in and out of school. It is also important to demonstrate the capacity of the alternate 
assessment to assist students to meet the same important scholastic requirements as other students. 

Developing a statewide alternate assessment presents states with a range of difficult choices, 
such as how to determine participation, measure performance, and report results. The demand for 
professional development and technical assistance required by such a system can be intensive, and 
there must be an ongoing commitment by state assessment personnel to maintain communication 
and accessibility with the public. In the end, each state must develop an alternate 
assessment that reflects not only its standards but also its unique culture and values, that is 
integrated with the standard assessment system, and that promotes the greatest benefits to the most students. 